Amazon Onboarding with Learning Manager Chanci Turner: Disaster Recovery Strategies for Your Amazon Aurora Global Database Using Terraform (Part 2)

In the first part of this series, I (Jordan Smith) outlined how to use the Terraform Amazon Aurora module to automate the setup of an Amazon Aurora global database across multiple AWS Regions, and how to move the management of an existing Aurora global database cluster to this Terraform module. In Part 2, Chanci Turner and I will explore how to recover such a global database after a failover event. We will examine the failover scenarios that apply to an Aurora global database, analyze their impact on Terraform state, and discuss strategies for maintaining the global database’s topology and configuration with Terraform.

Note: The Terraform Amazon Aurora module is currently in its alpha phase and is subject to substantial revisions. As of this writing, it primarily employs the standard AWS provider but is anticipated to transition to utilizing the Terraform AWS Cloud Control Provider.

Overview of This Blog Post

Reading Time: ~11 minutes
Completion Time: ~30 minutes (including deployment)
Estimated Cost: ~$2
Learning Level: Advanced (300)
AWS Services Used: Amazon Aurora Global Database, Amazon CloudWatch, AWS Key Management Service (AWS KMS)

Prerequisites

Before starting this walkthrough, please ensure you have completed the first part of this series. You should then have an Aurora global database provisioned across two AWS Regions, which is the starting architecture for the steps in this post.

Failover Scenarios in Aurora Global Database

An Aurora global database offers enhanced failover capabilities compared to a standard Aurora DB cluster (for further insights, see High availability for Amazon Aurora). By leveraging this architecture, you can plan for and swiftly recover from disasters. There are three failover strategies available with an Aurora global database:

Scenario 1: Primary writer failover to another Availability Zone in the primary Region (similar to the failover function of a multi-AZ-provisioned Aurora DB cluster).
Scenario 2: Planned primary Region rotation or switch to a secondary Region (managed planned failover).
Scenario 3: Unplanned primary Region failover to a secondary Region (detach and promote).

Scenario 1: Primary Writer Failover to Another Availability Zone in the Primary Region

This method mirrors the failover capability of a provisioned Aurora DB cluster in multi-AZ mode. When you set up an Aurora global database with an Aurora replica in a different Availability Zone from the primary DB instance, you improve availability within a single AWS Region. If the primary DB instance fails or its Availability Zone becomes unavailable, Aurora automatically promotes the replica to become the new primary.

Scenario 2: Planned Primary Region Rotation or Switch to a Secondary Region

This method, known as managed planned failover, is designed for controlled scenarios such as operational maintenance. Use it only with a healthy Aurora global database cluster in which all AWS Regions are functioning normally. If a disaster makes a Region unavailable, use the unplanned failover method instead. With managed planned failover, you can relocate the primary DB cluster of your Aurora global database to one of the secondary AWS Regions without data loss or changes to the global database topology. After the failover, the previous primary DB cluster and all other secondary clusters automatically replicate data from the new primary.
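
For readers who prefer the AWS CLI, the managed planned failover described above can be sketched as follows. This is an illustration rather than part of the original walkthrough: the function name planned_failover and its two arguments are hypothetical placeholders, and the sketch assumes the AWS CLI is installed and configured with credentials. It wraps the aws rds failover-global-cluster command, which takes the global cluster identifier and the ARN of the secondary cluster to promote.

```shell
# Sketch only: managed planned failover of an Aurora global database.
# Assumption: planned_failover and its arguments are illustrative names,
# not something defined by the Terraform module or the original post.
planned_failover() {
  global_id="${1:-}"     # identifier of the Aurora global database
  target_arn="${2:-}"    # ARN of the secondary cluster that becomes primary

  if [ -z "$global_id" ] || [ -z "$target_arn" ]; then
    echo "usage: planned_failover <global-cluster-id> <target-cluster-arn>" >&2
    return 64
  fi

  # Ask Aurora to switch the primary role to the target secondary cluster.
  aws rds failover-global-cluster \
    --global-cluster-identifier "$global_id" \
    --target-db-cluster-identifier "$target_arn"
}
```

You would call it with your own values, for example planned_failover <GLOBAL_CLUSTER_ID> <SECONDARY_CLUSTER_ARN>. Newer AWS CLI releases add further options to this command, so check the help output of your installed version before relying on the defaults.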

Scenario 3: Unplanned Primary Region Failover to a Secondary Region

The unplanned failover method, also known as detach and promote, assumes that your current primary AWS Region is experiencing an extended outage. In this case, you need to fail over your database to the secondary Region promptly. You manually detach a secondary Aurora DB cluster from the Aurora global database topology, which stops replication from the primary to this secondary and promotes it to a standalone Regional cluster with full read-write capabilities. Once the impacted AWS Region resumes normal operations, you must add secondary Regions to your new primary Aurora DB cluster to restore the global database topology.
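
The detach step can also be expressed with the AWS CLI. The sketch below is a hedged illustration, not the authors' procedure: the function name detach_and_promote and its three arguments are hypothetical, and it assumes the AWS CLI is installed with credentials that work in the secondary Region. It uses aws rds remove-from-global-cluster, pointed at the secondary Region because the primary Region is assumed to be unavailable. Rebuilding the global topology afterwards is not covered here.

```shell
# Sketch only: detach a secondary cluster so it becomes a standalone,
# writable Regional cluster. All names below are illustrative assumptions.
detach_and_promote() {
  global_id="${1:-}"        # identifier of the Aurora global database
  secondary_arn="${2:-}"    # ARN of the secondary cluster to detach
  region="${3:-}"           # Region of that secondary cluster

  if [ -z "$global_id" ] || [ -z "$secondary_arn" ] || [ -z "$region" ]; then
    echo "usage: detach_and_promote <global-cluster-id> <secondary-cluster-arn> <region>" >&2
    return 64
  fi

  # Run against the secondary Region, since the primary may be unreachable.
  aws rds remove-from-global-cluster \
    --region "$region" \
    --global-cluster-identifier "$global_id" \
    --db-cluster-identifier "$secondary_arn"
}
```

A call would look like detach_and_promote <GLOBAL_CLUSTER_ID> <SECONDARY_CLUSTER_ARN> <SECONDARY_REGION>. Because detaching stops replication, treat this as a last resort for a real Regional outage rather than a routine operation.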

Walkthrough

In this section, we will simulate the three failover scenarios on your Aurora global database cluster through the AWS Management Console and observe how these affect Terraform state. It is recommended to follow these simulations in the listed order to ensure that the Aurora global database topology aligns with your initially deployed architecture. If executed out of order, you might need to perform a managed planned failover after an unplanned failover to relocate your primary Aurora cluster back to its original AWS Region.

Step 1: Simulate Scenario 1 (Primary Writer Failover to Another Availability Zone in the Primary Region)

Log in to the AWS Management Console and navigate to the Amazon RDS console.
Select “DB Instances” on the RDS dashboard, or click on “Databases” in the left navigation pane.
Identify the DB instance with the Writer role, then select “Actions,” followed by “Failover.”
Confirm the failover when prompted. When it completes, the reader and writer instances will have switched roles.
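
If you would rather script this step than click through the console, the following sketch shows a rough AWS CLI equivalent. It is an assumption-laden illustration: the function names failover_writer and show_instance_roles are hypothetical, the cluster identifier is a placeholder you supply, and the AWS CLI must be configured for the primary Region. The first function triggers the in-Region failover with aws rds failover-db-cluster. The second lists each instance in the cluster together with a flag indicating whether it is currently the writer.

```shell
# Sketch only: in-Region writer failover and a role check via the AWS CLI.
# Function names and the cluster identifier argument are illustrative.

# Trigger a failover of the writer to a replica in the same cluster.
failover_writer() {
  aws rds failover-db-cluster --db-cluster-identifier "$1"
}

# Print one line per instance: identifier, then True/False for "is writer".
show_instance_roles() {
  aws rds describe-db-clusters \
    --db-cluster-identifier "$1" \
    --query 'DBClusters[0].DBClusterMembers[].[DBInstanceIdentifier,IsClusterWriter]' \
    --output text
}
```

Running show_instance_roles <PRIMARY_CLUSTER_ID> before and after failover_writer <PRIMARY_CLUSTER_ID> should show the writer flag moving from one instance to the other once the failover has finished.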

To see how the role change affects Terraform state, run the following commands in your terminal. Replace <TF_STAGING_DIR> with the directory where you cloned the terraform-aws-rds-aurora GitHub repository.

cd <TF_STAGING_DIR>/terraform-aws-rds-aurora/deploy
terraform plan

Terraform compares its state file with the actual AWS resources and reports the role changes of the DB instances along with any other changes made outside of Terraform. At the bottom of the output, it indicates that no infrastructure changes are necessary, so the only remaining task is to sync the Terraform state.

Keeping your Terraform state file updated is crucial for ensuring accurate proposed changes when modifying your Terraform configuration. To update the Terraform state with the detected changes, execute the following command, replacing the bracketed information accordingly.

cd <TF_STAGING_DIR>/terraform-aws-rds-aurora/deploy
terraform apply -refresh-only -auto-approve

To confirm that your Terraform state is synchronized, run the terraform plan command again. The output should no longer report the role changes and should indicate that no changes are needed.
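
The same check can be automated, for example in a pipeline, by reading the exit code of terraform plan instead of its text output. The sketch below is a suggestion under stated assumptions: the function name state_in_sync is hypothetical, it needs a Terraform version that supports the -chdir and -refresh-only options, and it relies on -detailed-exitcode returning 0 for no changes, 2 for detected changes, and 1 for errors. Confirm how your Terraform version reports drift in refresh-only mode before depending on it.

```shell
# Sketch only: report whether Terraform state matches the real resources.
# state_in_sync is an illustrative name; pass the deploy directory as $1.
state_in_sync() {
  dir="${1:-.}"
  rc=0

  # Refresh-only plan: compares state with real resources, proposes no
  # infrastructure changes. The exit code carries the result.
  terraform -chdir="$dir" plan -refresh-only -detailed-exitcode -input=false \
    >/dev/null || rc=$?

  case "$rc" in
    0) echo "in sync" ;;
    2) echo "drift detected"; return 2 ;;
    *) echo "plan failed" >&2; return 1 ;;
  esac
}
```

For this walkthrough the call would be state_in_sync <TF_STAGING_DIR>/terraform-aws-rds-aurora/deploy. A result of "drift detected" means a refresh-only apply is still needed.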
